<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Supported encodings [Universal Encoding Detector]</title>
<link rel="stylesheet" href="css/chardet.css" type="text/css">
<link rev="made" href="mailto:mark@diveintomark.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.65.1">
<meta name="keywords" content="character, set, encoding, detection, Python, XML, feed">
<link rel="start" href="index.html" title="Documentation">
<link rel="up" href="index.html" title="Documentation">
<link rel="prev" href="faq.html" title="Frequently asked questions">
<link rel="next" href="usage.html" title="Usage">
</head>
<body id="chardet-feedparser-org" class="docs">
<div class="z" id="intro"><div class="sectionInner"><div class="sectionInner2">
<div class="s" id="pageHeader">
<h1><a href="/">Universal Encoding Detector</a></h1>
<p>Character encoding auto-detection in Python.  As smart as your browser.  Open source.</p>
</div>
<div class="s" id="quickSummary"><ul>
<li class="li1">
<a href="http://chardet.feedparser.org/download/">Download</a> ·</li>
<li class="li2">
<a href="index.html">Documentation</a> ·</li>
<li class="li3"><a href="faq.html" title="Frequently Asked Questions">FAQ</a></li>
</ul></div>
</div></div></div>
<div id="main"><div id="mainInner">
<div class="section" lang="en">
<div class="titlepage">
<div>
<div><h2 class="title">
<a name="encodings" class="skip" href="#encodings" title="link to this section"><img src="images/permalink.gif" alt="[link]" title="link to this section" width="8" height="9"></a> Supported encodings</h2></div>
<div><div class="abstract">
<p><span class="application">Universal Encoding Detector</span> currently supports over two dozen character encodings.</p>
</div></div>
</div>
<div></div>
</div>
<div class="itemizedlist"><ul>
<li>
<tt class="literal">Big5</tt>, <tt class="literal">GB2312</tt>/<tt class="literal">GB18030</tt>, <tt class="literal">EUC-TW</tt>, <tt class="literal">HZ-GB-2312</tt>, and <tt class="literal">ISO-2022-CN</tt> (Traditional and Simplified Chinese)</li>
<li>
<tt class="literal">EUC-JP</tt>, <tt class="literal">SHIFT_JIS</tt>, and <tt class="literal">ISO-2022-JP</tt> (Japanese)</li>
<li>
<tt class="literal">EUC-KR</tt> and <tt class="literal">ISO-2022-KR</tt> (Korean)</li>
<li>
<tt class="literal">KOI8-R</tt>, <tt class="literal">MacCyrillic</tt>, <tt class="literal">IBM855</tt>, <tt class="literal">IBM866</tt>, <tt class="literal">ISO-8859-5</tt>, and <tt class="literal">windows-1251</tt> (Russian)</li>
<li>
<tt class="literal">ISO-8859-2</tt> and <tt class="literal">windows-1250</tt> (Hungarian)</li>
<li>
<tt class="literal">ISO-8859-5</tt> and <tt class="literal">windows-1251</tt> (Bulgarian)</li>
<li><tt class="literal">windows-1252</tt></li>
<li>
<tt class="literal">ISO-8859-7</tt> and <tt class="literal">windows-1253</tt> (Greek)</li>
<li>
<tt class="literal">ISO-8859-8</tt> and <tt class="literal">windows-1255</tt> (Visual and Logical Hebrew)</li>
<li>
<tt class="literal">TIS-620</tt> (Thai)</li>
<li>
<tt class="literal">UTF-32</tt> <acronym title="Big Endian">BE</acronym>, <acronym title="Little Endian">LE</acronym>, 3412-ordered, or 2143-ordered (with a <acronym title="Byte Order Mark">BOM</acronym>)</li>
<li>
<tt class="literal">UTF-16</tt> <acronym title="Big Endian">BE</acronym> or <acronym title="Little Endian">LE</acronym> (with a <acronym title="Byte Order Mark">BOM</acronym>)</li>
<li>
<tt class="literal">UTF-8</tt> (with or without a <acronym title="Byte Order Mark">BOM</acronym>)</li>
<li><acronym>ASCII</acronym></li>
</ul></div>
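<p>The <acronym title="Byte Order Mark">BOM</acronym>-based entries in the list above can be illustrated with a short sketch. This is not the detector's own code; it is a minimal example that assumes only the standard <tt class="literal">codecs</tt> module, and the function name <tt class="literal">sniff_bom</tt> and the returned labels are hypothetical. The 4-byte signatures are checked first because two of them begin with the same bytes as a <tt class="literal">UTF-16</tt> <acronym title="Byte Order Mark">BOM</acronym>.</p>

```python
import codecs

# BOM signatures, 4-byte ones first: FF FE 00 00 (UTF-32 LE) starts with
# the UTF-16 LE BOM, and FE FF 00 00 (3412 order) starts with the
# UTF-16 BE BOM, so the longer patterns must be tested before the shorter.
# The labels are illustrative names, not an official registry.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),               # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),               # FF FE 00 00
    (b"\xfe\xff\x00\x00", "UTF-32 (3412-ordered)"),
    (b"\x00\x00\xff\xfe", "UTF-32 (2143-ordered)"),
    (codecs.BOM_UTF8, "UTF-8"),                      # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),               # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),               # FF FE
]


def sniff_bom(data):
    """Return a label for the BOM at the start of data, or None."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None
```

<p>Data without a <acronym title="Byte Order Mark">BOM</acronym> yields <tt class="literal">None</tt> here; deciding among the remaining encodings requires looking at the content itself, which this sketch does not attempt.</p>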
<a name="id667094"></a><table class="caution" border="0" summary="">
<tr><td rowspan="2" align="center" valign="top" width="1%"><img src="images/caution.png" alt="Caution" title="" width="24" height="24"></td></tr>
<tr><td colspan="2" align="left" valign="top" width="99%">Because some encodings are inherently similar, the detector may report the wrong one.  In my tests, the most problematic case was Hungarian text encoded as <tt class="literal">ISO-8859-2</tt> or <tt class="literal">windows-1250</tt> (encoded as one but reported as the other).  Also, Greek text encoded as <tt class="literal">ISO-8859-7</tt> was often misreported as <tt class="literal">ISO-8859-2</tt>.  Your mileage may vary.</td></tr>
</table>
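<p>One way to see the similarity behind the Hungarian case is to compare the bytes that the two encodings produce. The sketch below is only an illustration and says nothing about how the detector reaches its decision. It assumes Python's standard codecs, where <tt class="literal">cp1250</tt> is the codec name for <tt class="literal">windows-1250</tt>; the helper name <tt class="literal">encodes_identically</tt> and the sample strings are made up for this example.</p>

```python
def encodes_identically(text, enc_a="iso-8859-2", enc_b="cp1250"):
    """Return True if text yields the same bytes in both encodings."""
    return text.encode(enc_a) == text.encode(enc_b)


# A Hungarian pangram-style phrase: every accented letter it uses sits at
# the same byte value in ISO-8859-2 and windows-1250.
hungarian = "árvíztűrő tükörfúrógép"
print(encodes_identically(hungarian))  # True

# By contrast, a letter such as "Š" is placed differently in the two
# encodings, so text containing it does give the detector a clue.
print(encodes_identically("Škoda"))    # False
```

<p>When the byte sequences are identical, as with the first string, no detector can tell the two encodings apart from the data alone, and either answer decodes the text correctly.</p>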
</div>
<div class="footernavigation">
<div style="float: left">← <a class="NavigationArrow" href="faq.html">Frequently asked questions</a>
</div>
<div style="text-align: right">
<a class="NavigationArrow" href="usage.html">Usage</a> →</div>
</div>
<hr>
<div id="footer"><p class="copyright">Copyright © 2006, 2007, 2008 Mark Pilgrim · <a href="mailto:mark@diveintomark.org">mark@diveintomark.org</a> · <a href="license.html">Terms of use</a></p></div>
</div></div>
</body>
</html>
